
Europäisches Patentamt / European Patent Office / Office européen des brevets




Certificate



The attached documents are exact copies of the European patent application described on the following page, as originally filed.



Patent application No. 99480050.6






The Hague,




For the President of the European Patent Office

P.O. 




C. PASTUREL










Sheet 2 of the certificate



Application no.: 99480050.6





Date of filing:



Applicant(s):
INTERNATIONAL BUSINESS MACHINES CORPORATION
Armonk, NY 10504
UNITED STATES OF AMERICA



Title of the invention:

Hardware device for processing the tasks of an algorithm in parallel



Priority(ies) claimed:

State: Date: File no.:



International Patent Classification:



Contracting states designated at date of filing: AT/BE/CH/CY/DE/DK/ES/FI/FR/GB/GR/IE/IT/LI/LU/MC/NL/PT/SE

Remarks:



EPA/EPO/OEB Form 1012 - 04.98






HARDWARE DEVICE FOR PROCESSING THE TASKS OF AN ALGORITHM IN PARALLEL

Technical field 

The invention relates to the processing of algorithms used in particular in the search engines of a large data communication network such as the Internet, and relates more particularly to a hardware device for processing the tasks of an algorithm in parallel.




Background 



The World Wide Web (WWW) provides a large source of information. Compared with traditional databases, Web information is dynamic and structured with hyperlinks. It can also be represented in different forms and is globally shared over multiple sites and platforms. Hence, querying over the WWW is significantly different from querying data in traditional databases, e.g. relational databases, which are structured, centralized and static. Traditional database querying can cope with tens of information sources, but it is ineffective for thousands.

Most Web documents are text-oriented. Much relevant information is usually embedded in the text and cannot be explicitly or easily specified in a user query. To facilitate Web searching, many search engines and similar programs have been developed. Most of these programs are database-based, meaning that the system maintains a database and that a user searches the Web by specifying a set of keywords and formulating a query to the database. Web search aids are variously referred to as catalogs, directories, indexes, search engines or Web databases.

A search engine is a Web site on the Internet that someone may use to find desired Web pages and sites. A search engine generally returns the results of a user's search ranked by relevancy.

A competent Web search engine must include the fundamental search facilities that Internet users are familiar with, which include Boolean logic, phrase searching, truncation, and limiting facilities (e.g. limit by field). Most services try, more or less, to index the full text of the original documents, which allows the user to find quite specialised information. Most services use best-match retrieval systems; some use only a Boolean system.

Web search engines execute algorithms whose internal processes are repetitive tasks with independent entry data. Classical step-by-step processing of all the processes and decisions on one entry data before processing the next entry data has the drawback of taking a long time to process all the data. This is the case, for example, when searching for a pattern within each file of a disk. The main repetitive processes to perform are: load file, open file, scan each word and compare it with the pattern for a match, append the result to a temporary file, and close file.
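The classical step-by-step processing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hypothetical function `search_files_sequentially` uses an in-memory dictionary in place of the disk and an `io.StringIO` buffer in place of the temporary file, and it matches whole words only.

```python
import io


def search_files_sequentially(files: dict[str, str], pattern: str) -> list[str]:
    """Search `pattern` in every file, finishing one file before starting the next.

    Assumption: `files` is an in-memory stand-in for the disk, mapping a
    hypothetical file name to its content, so no real file system is used.
    """
    temp = io.StringIO()  # stands in for the temporary result file
    for name, content in files.items():
        handle = io.StringIO(content)  # load file, open file
        for word in handle.read().split():  # scan each word
            if word == pattern:  # compare it with the pattern
                temp.write(f"{name}:{word}\n")  # append the result
        handle.close()  # close file
    return temp.getvalue().splitlines()
```

For example, `search_files_sequentially({"a.txt": "web search engine", "b.txt": "search the web"}, "web")` returns `["a.txt:web", "b.txt:web"]`; the second file is not even loaded before every step on the first one has finished.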

One way to improve performance, and in particular to improve search response time, is to achieve parallel processing by parallelizing the search mechanism in the database or index table. Such software parallelization improves performance, but it is nevertheless limited insofar as software processing, even when parallelized, requires a minimum amount of time which cannot be reduced.

Summary of the invention

Accordingly, the object of the invention is to provide a hardware assist device able to run a set of repetitive processes using local pipelining for each task, while maintaining a relationship between the parent task and the child task for each occurrence in the pipeline.



Another object of the invention is to provide a hardware device for processing the tasks of a search algorithm in parallel, wherein each specific task of the search is performed by a dedicated processor.

The invention therefore relates to a hardware device for processing the tasks of an algorithm of the type comprising a number of processes, the execution of some of which depends on binary decisions. The device comprises a plurality of task units, each associated with a task defined as being either one process, one decision, or one process together with the following decision, and a task interconnection logic block connected to each task unit for communicating actions from a source task unit to a destination task unit. Each task unit includes a processor for processing the steps of the associated task when the received action requests such processing, and a status manager for handling the actions coming from other task units and building the actions to be sent to other task units.

Brief description of the drawings

The above and other objects, features and advantages of the invention will be better understood by reading the following more particular description of the invention in conjunction with the accompanying drawings, wherein:

Fig. 1 represents an example of an algorithm composed of three processes and two decisions.

Fig. 2 represents the algorithm illustrated in Fig. 1, structured into several tasks to be executed by the hardware device according to the invention.

Fig. 3 is a block-diagram representing the hardware device 
according to the invention. 
Fig. 4 is a representation of the configuration register used to control each task executed by the hardware device of Fig. 3.

Figs. 5A and 5B are tables representing respectively the actions to be executed by each task of the algorithm illustrated in Fig. 2 as a function of the possible activation sources, for an instance and for the following instance.

Fig. 6 is a block diagram representing the connection between the task interconnection logic block of the hardware device of Fig. 3 and the different tasks of the algorithm.

Detailed description of the invention 

The simple algorithm illustrated in Fig. 1 as an example includes three processes P1, P2 and P3 and two decisions D1 and D2. Depending on each decision, different functions corresponding to the different paths in the algorithm may be run. The first function is represented by the algorithm flow when decision D1 is «yes», that is when processes P1 and P2 are to be executed. The second function is represented by the algorithm flow when decision D1 is «no» and decision D2 is «yes», that is when processes P1 and P3 are to be executed. Finally, the third function is represented by the algorithm flow when decision D1 is «no» and decision D2 is «no», that is when only process P1 is to be executed. The three functions share the same entry point and the same first process P1 in the algorithm flow. At each execution of the algorithm flow, process P1 is started first, and it is started again with the next entry data only once the current flow has ended. The second execution of P1 starts only when the first execution of P1 has been completed and when decisions D1 and D2 have been completed. Therefore, no overlap is possible in a simple step-by-step processing of the algorithm.

Though the algorithm represented in Fig. 1 is very simple, all algorithms are classically run in the same way.
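As a hedged illustration of this step-by-step behaviour, the following Python sketch runs the flow of Fig. 1 on successive entry data and records the order of events. The processes are placeholders that only log their execution, and the decisions `d1` and `d2` are hypothetical predicates applied to the entry data; neither detail comes from the text, and `run_step_by_step` is an illustrative name.

```python
from typing import Callable, Iterable


def run_step_by_step(
    entries: Iterable[int],
    d1: Callable[[int], bool],
    d2: Callable[[int], bool],
) -> list[str]:
    """Run the Fig. 1 flow on one entry at a time and record the event order.

    Assumption: P1, P2 and P3 only log their execution, and the decisions
    are hypothetical predicates on the entry data itself.
    """
    trace: list[str] = []
    for k, data in enumerate(entries):
        trace.append(f"P1[{k}]")
        trace.append(f"D1[{k}]")
        if d1(data):  # D1 = yes: function 1 runs P1 then P2
            trace.append(f"P2[{k}]")
            continue
        trace.append(f"D2[{k}]")
        if d2(data):  # D1 = no, D2 = yes: function 2 runs P1 then P3
            trace.append(f"P3[{k}]")
        # D1 = no, D2 = no: function 3 runs P1 only
    return trace
```

In the resulting trace, `P1[k+1]` is never recorded before every event of entry `k`, which is the absence of overlap noted above.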

All the events (processes or decisions) of the algorithm flow have to be executed step by step, although they are run repetitively with new entry data. The proposed invention allows the various processes and decisions to be run separately in order to speed up the processing of the algorithm, especially when some steps do not require prior data. The main idea is to assign one processor to a «task», which includes a process, a decision or a combination of processes and decisions. This processor runs all the repetitive instances of the task and is linked to the execution results of the other task processors by more detailed link information than the simple conventional link that enables the downstream tasks to be activated.

Using the principles of the invention, the algorithm of Fig. 1 can be divided into tasks as illustrated in Fig. 2. Four tasks are thus implemented:

Task 1 (T1) includes process P2 (no decision)
Task 2 (T2) includes process P3 (no decision)
Task 3 (T3) includes the sequential combination of process P1 and decision D1
Task 4 (T4) includes only decision D2 (no process)

According to the invention, each task is repetitively performed by one processor allocated to this task. Therefore, four processors are required to run the example algorithm of Fig. 1 and Fig. 2.
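The decomposition of Fig. 2 can be written down as a small data structure. In this sketch, which is an illustration rather than part of the invention, the class `Task` and the names `FIG2_TASKS`, `processors_needed` and `activation_sources` are hypothetical. Counting two outputs for each task that ends with a decision gives the four processors mentioned above and the six activation sources discussed with Fig. 5A and Fig. 5B.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    name: str
    process: Optional[str]  # process run first by the task, if any
    decision: Optional[str]  # decision that follows the process, if any

    @property
    def outputs(self) -> int:
        # A task ending with a binary decision has two outputs, L and R.
        return 2 if self.decision else 1


# Decomposition of the Fig. 1 algorithm into the four tasks of Fig. 2.
FIG2_TASKS = [
    Task("T1", "P2", None),
    Task("T2", "P3", None),
    Task("T3", "P1", "D1"),
    Task("T4", None, "D2"),
]

processors_needed = len(FIG2_TASKS)  # one processor per task
activation_sources = sum(t.outputs for t in FIG2_TASKS)  # T1, T2, T3L, T3R, T4L, T4R
```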

The hardware device according to the invention illustrated in Fig. 3 comprises as many task units 10, 12, 14 as there are tasks in the algorithm (Task1, Task2, ..., Taskn). The tasks are interconnected by means of a task interconnection logic block 16, as explained hereafter.
Each task unit, such as task unit 10, includes a processor 18 in charge of processing the sequential steps of the process, the decision, or the combination of process and decision incorporated in the corresponding task. Actions received from other task units or sent to other task units by means of task interconnection logic block 16 are managed by status manager 20, which is preferably a state machine. Status manager 20 is connected to processor 18 by two lines: an input line to processor 18 for starting (S) the task execution, and an output line from the processor which is activated when the task is completed (C).

Status manager 20 has essentially two functions: input and output. The input function handles incoming commands from other tasks, and the output function builds commands to be sent to other tasks. To perform these functions in conjunction with processor 18, several control/data registers 22, 24, 26 are used. Each control/data register corresponds, for this task, to an instance of the algorithm flow. The number of instances that can be run at the same time depends on the pipeline capability of processor 18. Generally, three control/data registers are needed, corresponding to instances m, m+1 and m+2.

Each control/data register 22, 24 or 26 contains a control field and a data field. The control field is composed of three bits controlled by processor 18: a validation bit V, a completion bit C, and a bit L/R indicating whether the output is Left or Right when the task includes a decision.

The data field of a control/data register contains data which are loaded by status manager 20 after it receives an action to be performed from another task, and before it starts the task execution by sending the start command to task processor 18. These data may be used by processor 18. When the latter has completed the task execution, it may replace the data contained in the control/data register by other data. These data are then sent to the destination task in the command word and used as an input field by the destination task processor. However, in the case of independent tasks, the data are not modified in the control/data register.
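The control/data registers of a task unit can be modelled as follows. This is a sketch under an assumption that the text does not state: the three registers are reused in rotation, the instance at level m occupying register m mod 3, so that levels m, m+1 and m+2 never share a register. The names `ControlDataRegister`, `RegisterBank`, `load` and `complete` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlDataRegister:
    v: int = 0  # validation bit V
    c: int = 0  # completion bit C
    lr: int = 0  # L/R bit: 1 = Left output, 0 = Right output
    data: int = 0  # data field


class RegisterBank:
    """Control/data registers of one task unit (hypothetical model).

    Assumption: the instance at level m uses register m % DEPTH, so the
    three instances m, m+1 and m+2 always use different registers.
    """

    DEPTH = 3

    def __init__(self) -> None:
        self.regs = [ControlDataRegister() for _ in range(self.DEPTH)]

    def slot(self, level: int) -> ControlDataRegister:
        return self.regs[level % self.DEPTH]

    def load(self, level: int, data: int) -> None:
        # Status manager: load the data field before sending S to the processor.
        self.slot(level).data = data

    def complete(self, level: int, left: bool, result: Optional[int] = None) -> None:
        # Processor: optionally replace the data, record L/R, then set C to 1.
        reg = self.slot(level)
        if result is not None:  # independent tasks leave the data unchanged
            reg.data = result
        reg.lr = 1 if left else 0
        reg.c = 1
```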

When processor 18 has completed the task execution, it sets bit C of the control field of the control/data register to 1, and a signal C may be sent to status manager 20. Thus, either status manager 20 is activated by the input signal C from task processor 18, or a polling or interrupt mechanism informs the status manager that bit C has been set to 1.

The commands which status manager 20 may receive from another task are START, KILL and VALID. As already mentioned, the START command is used to activate task processor 18. The KILL command means that a task is no longer of interest because the decision taken is opposite to this task. Thus, a task on the left path of a decision may be killed if the decision is to take the right path. When it receives a KILL command, status manager 20 clears the control/data register corresponding to the instance being considered; each command carries the instance value, called the level, as a parameter. Unlike the KILL command, the VALID command confirms that the considered task corresponds to the decision taken. In such a case, bit V of the corresponding control/data register is set to 1 by status manager 20.
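The input function of status manager 20 can be sketched as a small command handler. The sketch assumes that the register for level m is register m mod 3 and that the combined action «valid + start» arrives as a single `VALID+START` command; `Slot`, `StatusManagerInput` and `receive` are hypothetical names, and the list `start_signals` stands in for the start line S to processor 18.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    v: int = 0  # validation bit V
    c: int = 0  # completion bit C
    lr: int = 0  # L/R bit
    data: int = 0  # data field


@dataclass
class StatusManagerInput:
    """Input function of a status manager (hypothetical model)."""

    depth: int = 3
    slots: list[Slot] = field(default_factory=list)
    start_signals: list[int] = field(default_factory=list)  # levels started via S

    def __post_init__(self) -> None:
        self.slots = [Slot() for _ in range(self.depth)]

    def receive(self, command: str, level: int, data: int = 0) -> None:
        idx = level % self.depth  # assumption: level m uses register m % depth
        if command == "KILL":
            self.slots[idx] = Slot()  # clear the control/data register
            return
        if command not in ("START", "VALID", "VALID+START"):
            raise ValueError(f"unknown command {command!r}")
        if "VALID" in command:
            self.slots[idx].v = 1  # confirm the task for this instance
        if "START" in command:
            self.slots[idx].data = data  # load the data, then start the processor
            self.start_signals.append(level)
```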

The output function of status manager 20 builds commands based on the contents of two configuration registers, CONFIG.L 28 and CONFIG.R 30, and also on the contents of the control/data register involved. The contents of the CONFIG.L register, which is selected when bit L/R is set to 1, are given in Fig. 4. The CONFIG.R register, which is selected when bit L/R is set to 0, has exactly the same structure as the CONFIG.L register. The CONFIG.L and CONFIG.R registers are loaded at the beginning of the algorithm processing and remain unchanged, insofar as their data fields depend only on the algorithm structure.

As illustrated in Fig. 4, the CONFIG.L register contains a first block C, selected when bit C is set to 1, and a second block V, selected when bit V is set to 1. Each block C or V is used for two actions. For each action, the register contains the following three fields, wherein X = C or V and n = 1 or 2:

- Task Xn indicates which task should be activated.

- AXn indicates which action is to be performed, for example 00 = kill, 01 = start, 10 = valid and 11 = valid + start.

- LXn indicates the level (the instance) of the task corresponding to Task Xn, for example 00 = current level - 1, 01 = current level, 10 = current level + 1 and 11 = current level + 2.
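The example encodings given for the AXn and LXn fields can be expressed as lookup tables. The packing layout below (a Task Xn field of a hypothetical 4-bit width, followed by the 2-bit AXn and LXn codes) is an assumption made for illustration only, since the text fixes neither field widths nor bit positions; `encode_action` and `decode_action` are hypothetical names.

```python
ACTION_CODES = {0b00: "kill", 0b01: "start", 0b10: "valid", 0b11: "valid+start"}
LEVEL_OFFSETS = {0b00: -1, 0b01: 0, 0b10: +1, 0b11: +2}
TASK_BITS = 4  # hypothetical width of the Task Xn field


def encode_action(task: int, action: str, offset: int) -> int:
    """Pack Task Xn, AXn and LXn as [task | 2-bit action | 2-bit level]."""
    if not 0 <= task < (1 << TASK_BITS):
        raise ValueError("task number does not fit in the Task Xn field")
    a_code = {name: code for code, name in ACTION_CODES.items()}[action]
    l_code = {off: code for code, off in LEVEL_OFFSETS.items()}[offset]
    return (task << 4) | (a_code << 2) | l_code


def decode_action(word: int, current_level: int) -> tuple[int, str, int]:
    """Unpack the fields and turn the relative LXn code into an absolute level."""
    task = (word >> 4) & ((1 << TASK_BITS) - 1)
    action = ACTION_CODES[(word >> 2) & 0b11]
    level = current_level + LEVEL_OFFSETS[word & 0b11]
    return task, action, level
```

Under this assumed layout, starting Task 3 at the next level encodes as `0b0011_0110`, and decoding it at current level 10 gives `(3, "start", 11)`.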

As an example that helps to understand the invention, consider the algorithm illustrated in Fig. 2. There are four tasks T1, T2, T3 and T4 which can be executed, but there are six activation sources, since Task 3 and Task 4 each have two outputs. Furthermore, a task acting as a source task can activate a destination task in the same level or in the following level. Fig. 5A and Fig. 5B represent tables wherein the activation sources are associated with the columns and the tasks to be activated are associated with the rows. Fig. 5A corresponds to the activation of tasks in the same level, whereas Fig. 5B corresponds to the activation of tasks in level m+1 by activation sources in level m. Since only two levels are represented, there is no relationship between the processes of the algorithm over more than two consecutive levels.




In the tables illustrated in Fig. 5A and 5B, only the cells corresponding to an action from an activation source on a task are filled with a letter: S means «Start», V means «Validate» and K means «Kill». Note that the same source may act on two tasks: for example, T3R kills Task 1, and starts and validates Task 4.

As already mentioned, status manager 20 (Fig. 3) uses the control bits which have been previously loaded in the CONFIG.L and CONFIG.R registers associated with the task. Thus, if we consider Task 3, which generates two activation sources, the CONFIG.L and CONFIG.R registers have the following contents:
CONFIG.L

1. Block C
   - Action 1: Task C1 = Task 3, AC1 = start, LC1 = current level + 1
   - Action 2: none
2. Block V
   - Action 1: Task V1 = Task 1, AV1 = valid, LV1 = current level
   - Action 2: none

CONFIG.R

1. Block C
   - Action 1: Task C1 = Task 3, AC1 = start, LC1 = current level + 1
   - Action 2: none
2. Block V
   - Action 1: Task V1 = Task 1, AV1 = kill, LV1 = current level
   - Action 2: Task V2 = Task 4, AV2 = valid + start, LV2 = current level
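The output function of status manager 20 for Task 3 can be sketched from these two register contents: CONFIG.L is chosen when L/R = 1 and CONFIG.R when L/R = 0, then block C is used if C = 1 and block V if V = 1. The function `build_actions` and the `Action` tuple are hypothetical names, and relative levels are represented as integer offsets rather than as the 2-bit LXn codes.

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    task: int  # Task Xn
    action: str  # AXn
    level_offset: int  # LXn, as an offset from the current level


# Task 3 configuration registers, as listed in the text (None = no action).
CONFIG_L: dict[str, list[Optional[Action]]] = {
    "C": [Action(3, "start", +1), None],
    "V": [Action(1, "valid", 0), None],
}
CONFIG_R: dict[str, list[Optional[Action]]] = {
    "C": [Action(3, "start", +1), None],
    "V": [Action(1, "kill", 0), Action(4, "valid+start", 0)],
}


def build_actions(c: int, v: int, lr: int, level: int) -> list[tuple[int, str, int]]:
    """Select CONFIG.L if L/R = 1, else CONFIG.R, then use block C when
    C = 1 and block V when V = 1; return (task, action, absolute level)."""
    config = CONFIG_L if lr == 1 else CONFIG_R
    blocks = (["C"] if c else []) + (["V"] if v else [])
    actions = []
    for block in blocks:
        for act in config[block]:
            if act is not None:
                actions.append((act.task, act.action, level + act.level_offset))
    return actions
```

With C = V = 1 at level 7, the left output starts Task 3 at level 8 and validates Task 1 at level 7, while the right output starts Task 3 at level 8, kills Task 1 and validates and starts Task 4 at level 7, in line with the T3R actions described for Fig. 5A.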

The task interconnection logic block 16 is represented in Fig. 6. Each task, such as Task1, Task2, Task3, ..., Taskn, is an input to task interconnection logic block 16 but also an output of this block. Each input action or command may be of the same type as any of the output actions, such as KILL, START or VALID. Since an action is represented in the CONFIG.L and CONFIG.R registers by three control fields Task Xn, AXn and LXn, an action word may use these control fields, in addition to the corresponding data (see Fig. 4), to transmit the action to the destination task.

In the preferred embodiment illustrated in Fig. 6, the action word containing the control bits of the CONFIG.L or CONFIG.R register and the data is input to a three-state driver 40, 42, 44 or 46, where the Task Xn field is decoded in order to select the bus on which this action word should be put. This word, or only its remaining bits since the Task Xn field is no longer needed, is then decoded by the appropriate task to perform the requested action.
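The routing performed by the three-state drivers can be sketched as follows. The sketch is not a hardware description: each bus is modelled as a Python list, the high-impedance state of unselected drivers is modelled by leaving the other lists untouched, and the widths `TASK_BITS` and `PAYLOAD_BITS` and the function `drive` are hypothetical.

```python
TASK_BITS = 4  # hypothetical width of the Task Xn field
PAYLOAD_BITS = 12  # hypothetical width of the rest of the action word


def drive(word: int, buses: dict[int, list[int]]) -> None:
    """Model of one three-state driver: decode Task Xn, drive only that bus."""
    if word >> (TASK_BITS + PAYLOAD_BITS):
        raise ValueError("action word wider than TASK_BITS + PAYLOAD_BITS")
    task = word >> PAYLOAD_BITS  # Task Xn sits in the top bits
    if task not in buses:
        raise ValueError(f"no bus for task {task}")
    # Only the remaining bits are needed by the destination task.
    buses[task].append(word & ((1 << PAYLOAD_BITS) - 1))
    # Other buses are left untouched, like drivers in the high-impedance state.
```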

As illustrated in Fig. 6, there are as many buses as there are tasks. These buses are three-state, so that inactive inputs have no influence on the bus value: only the valid input, forced by the corresponding driver, takes the bus for its command. The width of the bus depends on the size of the action word; in the preferred embodiment, the bus size is equal to the word size. If the bus is narrower than the action word, the word can be split, in a well-known way, into several blocks that are appended when sent on the smaller bus. The only drawback of this split is an increased transmission latency, as several clock cycles are needed to transmit a command or action from an output task to an input task. At a minimum, the Task Xn field should be available in the first block of the split word so that it can be decoded correctly.
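Splitting an action word for a narrower bus can be sketched as follows, assuming that the Task Xn field occupies the most significant bits of the word so that it travels in the first block. The functions `split_word` and `join_blocks` are hypothetical; each returned block corresponds to one clock cycle on the bus.

```python
def split_word(word: int, word_bits: int, bus_bits: int) -> list[int]:
    """Split an action word into bus-sized blocks, most significant block first.

    Assumption: Task Xn is in the top bits, so it is carried by the first block.
    """
    n_blocks = -(-word_bits // bus_bits)  # ceiling division: one clock per block
    padded = word << (n_blocks * bus_bits - word_bits)  # zero-pad the last block
    mask = (1 << bus_bits) - 1
    return [(padded >> (bus_bits * i)) & mask for i in reversed(range(n_blocks))]


def join_blocks(blocks: list[int], word_bits: int, bus_bits: int) -> int:
    """Reassemble the blocks received in order and drop the padding bits."""
    padded = 0
    for block in blocks:
        padded = (padded << bus_bits) | block
    return padded >> (len(blocks) * bus_bits - word_bits)
```

For a 16-bit word on a 6-bit bus, three blocks (three clock cycles) are needed instead of one, which is the latency cost mentioned above.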



Each task can then put all its actions on the various buses. As long as two tasks cannot put an action on the same bus simultaneously, no arbitration is required, which is the case for most algorithms. Otherwise, an arbitration mechanism may be added on the control of each three-state driver to identify two simultaneous requests for the same destination. A very simple contention mechanism may, for example, give priority on the destination bus to the lower-numbered source task.
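The simple contention mechanism can be sketched in Python, interpreting the priority rule as favouring the lower-numbered source task, which is an assumption. The function `arbitrate` is hypothetical: it grants each destination bus to one requester per clock cycle and returns the requests that must wait.

```python
def arbitrate(requests: list[tuple[int, int]]) -> tuple[dict[int, int], list[tuple[int, int]]]:
    """Grant each destination bus to the lowest-numbered requesting source.

    `requests` holds (source task, destination task) pairs raised in the same
    clock cycle; the second result lists the requests deferred to a later cycle.
    """
    grants: dict[int, int] = {}
    deferred: list[tuple[int, int]] = []
    for src, dst in sorted(requests):  # lowest source numbers come first
        if dst in grants:
            deferred.append((src, dst))  # bus already taken this cycle
        else:
            grants[dst] = src
    return grants, deferred
```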
1. Hardware device for processing the tasks of an algorithm of the type comprising a number of processes, the execution of some of which depends on binary decisions; said device comprising a plurality of task units (10, 12, 14) which are each associated with a task defined as being either one of said processes or one of said decisions or one of said processes together with the following decision, and a task interconnection logic block (16) connected to each task unit for communicating actions from a source task unit to a destination task unit, each task unit including a processor (18) for processing the steps of the associated task when the received action requests such a processing and a status manager (20) for handling the actions coming from other task units and building the actions to be sent to other task units.

2. Hardware device according to claim 1, wherein said actions communicated from a source task unit to a destination task unit are START, used to activate the processor (18) of said destination task unit, KILL, used to cancel the task associated with said destination task unit, and VALID, used to confirm that the task associated with said destination task unit corresponds to a decision included in said task.

3. Hardware device according to claim 2, wherein said status manager (20) activates said processor (18) for processing the steps of the task associated with said destination task unit when the action received from a source task unit is START.

4. Hardware device according to claim 3, wherein said status manager (20) is a state machine.
5. Hardware device according to any one of claims 2 to 4, wherein each of said task units (10, 12, 14) further comprises a plurality of control/data registers (22, 24, 26), each corresponding, for the task associated with said task unit, to an instance of the algorithm flow, each one of said control/data registers comprising a control field composed of a completion bit (C) set to 1 when the associated task is completed, a validation bit (V) set to 1 when the associated task is validated, and an L/R bit indicating whether the output in the algorithm flow is left or right when said task includes a decision.

6. Hardware device according to claim 5, wherein each of said control/data registers (22, 24, 26) includes a data field which is loaded if necessary by said status manager (20) activated by an action received from a source task unit, said processor (18) using these data for executing the task associated with said task unit and replacing them if necessary.

7. Hardware device according to claim 6, wherein said completion bit (C) is sent by said processor (18) to said status manager (20) after completion of the task execution.



8. Hardware device according to claim 5, 6 or 7, wherein said control/data register (22, 24 or 26) corresponding to a specific instance is cleared by said status manager (20) when the latter receives an action KILL for the task associated with said task unit and for said specific instance.

9. Hardware device according to any one of claims 5 to 8, wherein each one of said task units (10, 12, 14) comprises two configuration registers CONFIG.L (28) and CONFIG.R (30) which are respectively selected by the value of said bit L/R of the control/data register (22, 24, 26) of the instance being considered, the contents of said configuration registers being loaded at the beginning of the algorithm processing for defining the task to be activated, the action to be performed and the instance to be considered.

10. Hardware device according to any one of claims 1 to 9, wherein said task interconnection logic block (16) is composed of three-state drivers (40, 42, 44), each one of said drivers being associated with one of said tasks as input task, and of a number of buses equal to the number of said tasks as output tasks, one of said buses being selected by the driver corresponding to an input task after decoding of an action word by said driver.

11. Hardware device according to claim 10 and claims 5 and 9, wherein said action word contains the control bits of said configuration registers (28, 30) and data from said control/data registers of said task unit (10, 12 or 14) associated with the input task.






FR 9 99 027 
Benayoun et al 

1/4 







FIG. 4: CONFIG REGISTER L. BLOCK C: ACTION 1 (TASKC1, AC1, LC1), ACTION 2 (TASKC2, AC2, LC2). BLOCK V: ACTION 1 (TASKV1, AV1, LV1), ACTION 2 (TASKV2, AV2, LV2).



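FIG. 5A: LEVEL m to LEVEL m. Activation sources (columns): T1, T2, T3L, T3R, T4L, T4R. Tasks activated (rows): T1, T2, T3, T4. Cell entries: S (Start), V (Validate), K (Kill).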



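FIG. 5B: LEVEL m to LEVEL m+1. Activation sources (columns): T1, T2, T3L, T3R, T4L, T4R. Tasks activated (rows): T1, T2, T3, T4. Cell entries: S (Start), V (Validate), K (Kill).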







HARDWARE DEVICE FOR PROCESSING THE TASKS OF AN ALGORITHM IN PARALLEL



Abstract 




Hardware device for processing the tasks of an algorithm of the type comprising a number of processes, the execution of some of which depends on binary decisions, the device comprising a plurality of task units (10, 12, 14) which are each associated with a task defined as being either one process or one decision or one process together with the following decision, and a task interconnection logic block (16) connected to each task unit for communicating actions from a source task unit to a destination task unit, each task unit including a processor (18) for processing the steps of the associated task when the received action requests such a processing and a status manager (20) for handling the actions coming from other task units and building the actions to be sent to other task units.